Abstract

This paper considers M-estimation of a nonlinear regression model with multiple change-points occurring at unknown times. The multi-phase random design regression model, discontinuous at each change-point, has an arbitrary error $\varepsilon$. In the case when the number of jumps is known, the M-estimators of the locations of the breaks and of the regression parameters are studied. These estimators are consistent and the asymptotic distribution of the regression parameter estimators is Gaussian. The estimator of each change-point converges, with rate $n^{-1}$, to the smallest minimizer of independent compound Poisson processes. The results are valid for a large class of error distributions.

Keywords: multiple change-points, M-estimator, random parametric regression, asymp- 
totic properties 

1 Introduction 

Change-points are intrinsic features of signals that appear in economics, medicine and physical science. The statistics literature contains a vast amount of work on issues related to the estimation of the change-point for a parametric regression, most of it specifically designed for the case of a single break. The most commonly used estimators are the maximum likelihood estimators, the least squares estimators or a wider class, the M-estimators. Statistical inference for a parametric model is influenced by the continuity or discontinuity of the regression function at the change-points, but also by whether the explanatory variable is deterministic or random. We give a non-exhaustive list of recent papers; the area of research is so active that it is nearly impossible to list them all.

For the least squares (LS) estimators we refer to Feder (1975a, 1975b) for continuous two-lines 
models, Lai et al. (1979), Yao and Au (1989) for a step function, Liu et al. (1997), Bai and 
Perron (1998) for multiple structural changes in a linear model. 

For the maximum likelihood (ML) estimator, when the design is deterministic, Bhattacharya (1994) discusses its limiting behaviour for a discontinuous linear model. Gill (2004), Gill and Baron (2004) consider a model where the canonical parameter of an exponential family gradually begins to drift from its initial value at an unknown change-point. For a random design we refer to Koul and Qian (2002) for a two-line model, Ciuperca (2004) for a single jump in a nonlinear model, Ciuperca and Dapzol (2008) for multiple change-points in linear and nonlinear models. If the model variance depends on the mean, the quasi-likelihood estimator can be considered. Braun et al. (2000) consider that the mean is constant between two change-points. Chiou and Muller (2004) propose a semi-parametric estimator in a generalized linear model with deterministic design.

In the general case of M-estimators, Rukhin and Vajda (1997) consider the change-point esti- 
mation problem as a nonlinear regression problem, the model being continuous, with a single 
change-point and fixed design. Koul et al. (2003) study the M-estimators in two-phase linear 
regression with random design. 

The present paper makes several contributions to the existing literature. The design considered is random, the regression function is nonlinear within a multi-regime framework and, last but not least, the estimation method is general. We study the properties of the M-estimator in a multi-phase discontinuous nonlinear random regression model with a general error distribution. The class of M-estimators was introduced by Huber (1964) and its principal properties are presented in Huber (1981). We generalize, among others, the results for the two-phase random linear model of Koul et al. (2003) obtained by M-estimation, the results obtained by ML estimation by Ciuperca and Dapzol (2008) for a multi-phase random nonlinear model and those of Bai and Perron (1998) obtained by LS estimation in a multiple nonrandom linear regression. An important point of the proofs for the linear case is the relation between the regression function and its derivatives with respect to the regression parameters. Thus we have to modify the approach for nonlinear regression. Also, in the case of a single change-point, each of the two regimes has one fixed boundary. For multiple breaks, each middle regime has completely unknown boundaries.

The paper is organized as follows. We give the necessary notations and definitions in Section 2. In Section 3 we establish the consistency of the estimators and the convergence rate. Weak convergence results are also obtained: the asymptotic distribution of the M-estimator of the regression parameters is Gaussian. We also prove that $n(\hat\theta_{2n} - \theta_2^0)$ converges weakly to the smallest minimizer vector of independent compound Poisson processes, where $\hat\theta_{2n}$ is the change-point estimator. Auxiliary results are given in the Appendix.

2 Notations and model 

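Consider the step-function with $K$ ($K \ge 1$) fixed change-points, for $x \in \mathbb{R}$:

$$f_\theta(x) = h_{\alpha_0}(x)\mathbf{1}_{x \le \tau_1} + h_{\alpha_1}(x)\mathbf{1}_{\tau_1 < x \le \tau_2} + \dots + h_{\alpha_K}(x)\mathbf{1}_{\tau_K < x}$$

where $\theta_1 = (\alpha_0, \alpha_1, \dots, \alpha_K)$ are the nonlinear regression parameters and $\theta_2 = (\tau_1, \dots, \tau_K)$, $\tau_1 < \tau_2 < \dots < \tau_K$, are the change-points. For all $k = 0, 1, \dots, K$, the parameter $\alpha_k$ belongs to some compact $\Gamma \subset \mathbb{R}^d$. We consider that the vector $\theta_2 \in \mathbb{R}^K$ and we set $\theta = (\theta_1, \theta_2) \in \Omega = \Gamma^{K+1} \times \mathbb{R}^K$.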

Consider the random design model:

$$Y_i = f_\theta(X_i) + \varepsilon_i, \qquad i = 1, \dots, n$$

where $(\varepsilon_i, X_i)$ is a sequence of continuous independent random variables with the same joint distribution as $(\varepsilon, X)$. The parameter $\theta_1$ and the change-points (or break points) are unknown. The purpose is to estimate $\theta = (\theta_1, \theta_2)$ when $n$ observations of $(Y, X)$ are available.
We denote the true value of a parameter with a superscript 0. In particular, $\theta_1^0 = (\alpha_0^0, \alpha_1^0, \dots, \alpha_K^0)$ and $\theta_2^0 = (\tau_1^0, \dots, \tau_K^0)$ are used to denote, respectively, the true values of the regression parameters and the true change-points. Let also $\theta^0 = (\theta_1^0, \theta_2^0)$. We suppose that $\theta_1^0$ is an inner point of the set $\Gamma^{K+1}$.

The random variables $X$ and $\varepsilon$ satisfy the following assumptions:

(A1) $X$ has a positive absolutely continuous Lebesgue density $\varphi$ on $\mathbb{R}$. Moreover, $\mathbb{E}(X^2) < \infty$;
(A2) $\varepsilon$ has a density absolutely continuous and positive everywhere on $\mathbb{R}$. Moreover, $\mathbb{E}(\varepsilon) = 0$, $\mathbb{E}(\varepsilon^2) < \infty$;
(A3) the random variables $X_i$ and $\varepsilon_i$ are independent.

In the case of a linear model with a single change-point, $h_\alpha(x) = a + bx$, $\alpha = (a, b)$ and $K = 1$, assumptions (A1)-(A3) on $X$ and $\varepsilon$ are also considered by Koul et al. (2003).

The nonlinear function $h_\alpha$ satisfies the conditions:

(B1) for all $x \in \mathbb{R}$, $h_\alpha(x)$ is three times differentiable with respect to $\alpha$;
(B2) for all $x \in \mathbb{R}$, $\|\partial h_\alpha(x)/\partial\alpha\| \ne 0$;
(B3) the derivatives $\partial^3 h_\alpha(x)/\partial\alpha^3$ exist for $x \in \mathbb{R}$ and there exist functions $F_0, F_1, F_2 \in L^2(\varphi)$ such that:

$$\sup_{\alpha \in \Gamma} |h_\alpha(x)| \le F_0(x), \qquad \sup_{\alpha \in \Gamma} \|\partial^j h_\alpha(x)/\partial\alpha^j\| \le F_j(x), \quad j = 1, 2 \qquad (1)$$

Obviously, in the case $h_\alpha(x) = a + bx$, assumptions (B1), (B2) are satisfied and (B3) reduces to (A1). If $h_\alpha(x)$ is a polynomial of degree $p$, assumption (B3) can be replaced by $\mathbb{E}(X^{p+1}) < \infty$.

Assumption (B2) is necessary for obtaining the convergence rate of the regression parameters estimator.

Let us consider the functions $d_{(\alpha_k, \alpha_j)}(x) := h_{\alpha_k}(x) - h_{\alpha_j}(x)$, $x \in \mathbb{R}$, $k, j \in \{0, 1, \dots, K\}$, and the jump at the true break point: $d_k^0 := d_{(\alpha_k^0, \alpha_{k-1}^0)}(\tau_k^0)$. We make the identifiability assumption that the jump at each $\tau_k^0$ is non-zero:

$$d_{(\alpha_k, \alpha_{k+1})}(\tau_k) \ne 0, \qquad \forall \alpha_k, \alpha_{k+1} \in \Gamma, \ \alpha_k \ne \alpha_{k+1} \qquad (2)$$

a condition which implies that the function $f_\theta$ is not continuous at the true break points for all parameters in $\Gamma$. For $\theta^* = (\theta_1^*, \theta_2^*)$ and $\theta = (\theta_1, \theta_2)$, let us denote by $\delta_{(\theta,\theta^*)}(x) := f_\theta(x) - f_{\theta^*}(x)$ the difference between two models. Note also: $\dot f_\theta(x) = \partial f_\theta(x)/\partial\theta_1$.

In the following, we denote by $C$ a generic positive finite constant not depending on $n$.
For a vector, let us denote by $\|\cdot\|$ the Euclidean norm and for a matrix $A = (a_{ij})$, $\|A\| = \sum_{i,j} |a_{ij}|$.

For a vector $v = (v_1, \dots, v_K)$ we make the convention that $|v| = (|v_1|, \dots, |v_K|)$.

The most important method of constructing statistical estimators is to choose the estimator to maximize or minimize a certain criterion function. Such estimators are called M-estimators. The maximum likelihood (ML), least squares and least absolute deviation estimators are particular cases. For a function $\rho : \mathbb{R} \to \mathbb{R}_+$, let the M-process be:

$$M_n(\theta) = \sum_{i=1}^n \rho(Y_i - f_\theta(X_i))$$

The following assumptions are considered for the function $\rho$:

(C1) $\rho$ is convex on $\mathbb{R}$ with right-continuous non-decreasing almost everywhere derivative $\psi$ satisfying $\mathbb{E}_\varepsilon[\psi^2(\varepsilon + y)] < \infty$, $\forall y \in \mathbb{R}$. The function $\lambda(y) := \mathbb{E}_\varepsilon[\psi(\varepsilon + y)]$, $y \in \mathbb{R}$, is strictly increasing on $\mathbb{R}$ and $\lambda$ is continuous at 0 with $\lambda(0) = 0$.
(C2) for all $c \in \mathbb{R}$, where $\bar\Omega$ is the closure of $\Omega$.
(C3) the function $y \to \mathbb{E}[|\psi(\varepsilon + c + y) - \psi(\varepsilon)|]$ is continuous at 0, $\forall c \in \mathbb{R}$.
(C4) the function $\lambda$ is differentiable in a neighborhood of 0, with derivative $\lambda'$ satisfying $\lambda'(0) \ne 0$, and $\lim_{a \to 0} a^{-1} \int_0^a |\lambda'(s) - \lambda'(0)|\,ds = 0$.
(C5) the random variables $\rho(\varepsilon \pm d_k^0) - \rho(\varepsilon)$, $\forall k = 1, \dots, K$, are continuous.

Assumptions (C1), (C2) are necessary for obtaining the consistency of the estimators, while (C1)-(C5) are used for obtaining the rate of convergence and the asymptotic distribution.
Notice that for the two-phase linear regression function $f_\theta(x) = (a_0 + b_0 x)\mathbf{1}_{x \le \tau} + (a_1 + b_1 x)\mathbf{1}_{x > \tau}$, Koul et al. (2003) consider the same assumptions (C1)-(C5). Obviously, (C2) becomes: $\mathbb{E}_{(\varepsilon,X)}[\psi^2(\varepsilon + c_1 + c_2|X|)] < \infty$, $\forall c_1, c_2 \in \mathbb{R}$, and (C5) becomes: $\rho(\varepsilon \pm d) - \rho(\varepsilon)$ continuous, with $d = (a_1^0 - a_0^0) + \tau^0(b_1^0 - b_0^0)$, $(a_0^0, b_0^0, a_1^0, b_1^0, \tau^0)$ the true value of $(a_0, b_0, a_1, b_1, \tau)$.

For each $\eta > 0$, denote the $\eta$-neighborhood of $\theta \in \bar\Omega$ by:

$$\Omega_\eta(\theta) := \{\theta^* = (\theta_1^*, \theta_2^*) \in \bar\Omega \,;\ \|\theta_1^* - \theta_1\| < \eta, \ \|\theta_2^* - \theta_2\| < \eta\}$$

The M-estimator is defined by

$$\hat\theta_n = (\hat\theta_{1n}, \hat\theta_{2n}) := \arg\min_{\theta \in \bar\Omega} M_n(\theta)$$

where $\bar\Omega$ is the closure of $\Omega$. Let $\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$. The set $\bar{\mathbb{R}}$ is compact under the metric $m(x, y) = |\arctan x - \arctan y|$, $x, y \in \bar{\mathbb{R}}$. To construct the M-estimator, we first search for the regression parameters estimator and then we localize the change-points. First, for a given $\theta_2 \in \bar{\mathbb{R}}^K$, we set:

$$\tilde\theta_{1n}(\theta_2) := \arg\min_{\theta_1 \in \Gamma^{K+1}} M_n(\theta_1, \theta_2)$$

Since the number $K$ of change-points is fixed, the estimator $\tilde\theta_{1n}(\theta_2)$ is constant in $\theta_2$ over any interval between two consecutive ordered $X_i$'s. The M-process $M_n(\tilde\theta_{1n}(\theta_2), \theta_2)$ has only a finite number of possible values, with change-points located at the ordered $X_i$'s. Second, we find the minimizer $\tilde\theta_{2n}$ of $M_n(\tilde\theta_{1n}(\theta_2), \theta_2)$ with respect to $\theta_2$ over the sample percentiles $\{X_i, i = 1, \dots, n\}$. This minimizer may be taken as the left end point of the interval over which it is obtained. Then $\hat\theta_{2n} = \tilde\theta_{2n}$ and the M-estimator is: $\hat\theta_n = (\tilde\theta_{1n}(\hat\theta_{2n}), \hat\theta_{2n})$.
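The two-step construction above (inner minimization over the regression parameters, then a finite scan over the ordered design points) can be sketched in code. The sketch below is only an illustration under strong simplifying assumptions that are not in the paper: a single change-point ($K = 1$), a constant level in each regime, and $\rho(x) = |x|$, for which the inner minimizer is the segment median. The function names and the `min_seg` guard are hypothetical.

```python
import random
import statistics


def lad_loss(residuals):
    """Sum of rho(r) = |r| over the residuals (least absolute deviations)."""
    return sum(abs(r) for r in residuals)


def profile_m_estimate(xs, ys, min_seg=5):
    """Profile M-estimation of one change-point in a two-level step model.

    For each candidate tau located at an ordered X_i, the inner problem
    (minimising over the two levels) is solved by the segment medians,
    which minimise the LAD criterion. The outer problem is a finite scan.
    min_seg is an illustrative guard keeping a few points in each regime.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = None
    for j in range(min_seg, n - min_seg + 1):
        left = [y for _, y in pairs[:j]]     # observations with X_i <= tau
        right = [y for _, y in pairs[j:]]    # observations with X_i > tau
        a0, a1 = statistics.median(left), statistics.median(right)
        value = lad_loss([y - a0 for y in left]) + lad_loss([y - a1 for y in right])
        if best is None or value < best[0]:
            # tau is reported as the left end point of the interval
            best = (value, pairs[j - 1][0], a0, a1)
    _, tau_hat, a0_hat, a1_hat = best
    return tau_hat, a0_hat, a1_hat


def simulate(n, tau0=0.5, a0=0.0, a1=2.0, sigma=0.5, seed=0):
    """Random design X ~ N(0, 1), Gaussian errors, one jump at tau0."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [(a0 if x <= tau0 else a1) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


xs, ys = simulate(400)
tau_hat, a0_hat, a1_hat = profile_m_estimate(xs, ys)
print(tau_hat, a0_hat, a1_hat)
```

For a genuinely nonlinear $h_\alpha$ the inner step would need a numerical optimizer in place of the median; the outer scan over ordered design points stays the same.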

Remark. The model and the estimator considered are very general. The class of M-estimators includes the least squares ($\rho(x) = x^2$), maximum likelihood ($\rho(\cdot) = -\log\varphi_\varepsilon(\cdot)$, with $\varphi_\varepsilon$ the density of $\varepsilon$) and least absolute deviations estimators ($\rho(x) = |x|$). Examples of distributions satisfying these conditions include Normal for $X$, double exponential or Normal for the errors $\varepsilon$ if $\rho(x) = |x|^a$, $a \in \{1/2, 2\}$.
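To make the role of $\rho$ concrete, the following sketch evaluates the M-process $M_n(\theta) = \sum_i \rho(Y_i - f_\theta(X_i))$ for a multi-regime regression function and a pluggable criterion. It assumes the regime convention $\tau_k < x \le \tau_{k+1}$; the exponential form of `h_exp` and all names are hypothetical choices made for illustration only.

```python
import bisect
import math


def f_theta(x, alphas, taus, h):
    """Piecewise regression function: regime k applies when
    tau_k < x <= tau_{k+1}, with tau_0 = -inf and tau_{K+1} = +inf."""
    k = bisect.bisect_left(taus, x)   # number of change-points strictly below x
    return h(alphas[k], x)


def m_process(theta1, theta2, xs, ys, h, rho):
    """M_n(theta) = sum_i rho(Y_i - f_theta(X_i))."""
    return sum(rho(y - f_theta(x, theta1, theta2, h)) for x, y in zip(xs, ys))


# Criterion functions named in the remark.
RHO = {
    "LS": lambda r: r * r,   # least squares
    "LAD": abs,              # least absolute deviations
}


def h_exp(alpha, x):
    """Hypothetical nonlinear regime function h_alpha(x) = a * exp(b * x)."""
    a, b = alpha
    return a * math.exp(b * x)


xs = [-1.0, 0.0, 1.0, 2.0]
ys = [1.0, 1.0, 2.0, 3.0]
theta1 = [(1.0, 0.0), (2.0, 0.0)]   # two regimes, K = 1
theta2 = [0.5]                      # one change-point
print(m_process(theta1, theta2, xs, ys, h_exp, RHO["LS"]))
print(m_process(theta1, theta2, xs, ys, h_exp, RHO["LAD"]))
```

Only `rho` changes between estimators; the model and the data handling are shared, which is the sense in which the M-estimation framework covers LS and LAD at once.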

For the ML estimator in a multi-phase nonlinear random model, the conditions imposed on the random variables are (A1), (A2), (A3), and the density $\varphi_\varepsilon$ of $\varepsilon$ satisfies: the score function $u(x) = \varphi_\varepsilon'(x)/\varphi_\varepsilon(x)$ is Hölder and differentiable, and $u'$ is also Hölder (see Ciuperca and Dapzol (2008)). The function $h_\alpha$ satisfies condition (B2) and:

$$\sup_{\theta, \theta^* \in \Omega} \mathbb{E}_{(\varepsilon,X)}\big[u^2(\varepsilon + f_\theta(X) - f_{\theta^*}(X))\big] < \infty$$

3 Asymptotic properties 

In this section we study the asymptotic properties of the estimator. First, we study the convergence of the M-estimator and we find the rate of convergence.

3.1 Consistency and rate of convergence 

For each change-point $\tau_k^0$, since the density of $X$ is absolutely continuous on $\mathbb{R}$, we have:

$$n^{-1} \sum_{i=1}^n \mathbf{1}_{|X_i - \tau_k^0| < B/n} = O_{\mathbb{P}}(n^{-1}) \qquad (3)$$
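Relation (3) says that the number of design points within $B/n$ of a change-point stays bounded in probability as $n$ grows. A small Monte Carlo sketch can illustrate this; it assumes, for illustration only, $X \sim N(0,1)$, a change-point at 0 and $B = 1$, in which case the expected count is close to $2B\varphi(0)$ for large $n$. The function names are hypothetical.

```python
import math
import random


def count_near(xs, tau, radius):
    """Number of design points X_i with |X_i - tau| <= radius."""
    return sum(1 for x in xs if abs(x - tau) <= radius)


def mean_count(n, tau=0.0, B=1.0, reps=500, seed=1):
    """Monte Carlo mean of the count of X_i within B/n of tau, X ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += count_near(xs, tau, B / n)
    return total / reps


phi_tau = 1.0 / math.sqrt(2.0 * math.pi)   # N(0, 1) density at tau = 0
for n in (100, 400):
    # The mean count should stay near 2 * B * phi(tau), whatever n is.
    print(n, mean_count(n), 2.0 * 1.0 * phi_tau)
```

Dividing the count by $n$ gives the left-hand side of (3), which is therefore of order $n^{-1}$.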

It is interesting to mention that in an identifiable regular model for a density with jumps, the ML estimator is of order $n^{-1}$ (see Ibragimov and Has'minskii (1981)). For the multi-phase problem, we obtain that the M-estimator of the change-point has the same order of convergence. Also for a regular model, van der Vaart and Wellner (1996) obtain the rate of convergence of the M-estimator.

The next theorem establishes the strong consistency of the M-estimator and shows that the rate of convergence of $\hat\theta_{2n}$ to $\theta_2^0$ is $n^{-1}$ and that of $\hat\theta_{1n}$ to $\theta_1^0$ is $n^{-1/2}$. The theorem includes the results derived by Koul et al. (2003) when $h_\alpha$ is linear for the M-estimator and by Ciuperca and Dapzol (2008) when $h_\alpha$ is nonlinear for the ML estimator. Note that for the ML estimator in a nonlinear random model, the discontinuity of the function $f_\theta(x)$ at the change-points is not necessary to show the consistency of the estimators.

In order to simplify the study of the rate of convergence, three processes defined as differences between two M-processes are considered. The first one is the difference between the M-process calculated at some point $\theta$ and the M-process at the true point $\theta^0$:

$$D_n(\theta_1, \theta_2) := M_n(\theta_1, \theta_2) - M_n(\theta_1^0, \theta_2^0) \qquad (4)$$

For the second one, the regression parameters vary around $\theta_1^0$, for $w_1 \in \Gamma^{K+1}$:

$$D_n^{(1)}(w_1) := M_n(\theta_1^0 + n^{-1/2} w_1, \theta_2^0) - M_n(\theta_1^0, \theta_2^0)$$

the coefficient of $w_1$ being the rate of convergence of the estimator $\hat\theta_{1n}$, and finally we let the change-points vary:

$$D_n^{(2)}(\theta_1, \theta_2) := M_n(\theta_1, \theta_2) - M_n(\theta_1, \theta_2^0)$$

The relation between these processes is given by the following decomposition:

$$D_n(\theta_1, \theta_2) = D_n^{(1)}\big(n^{1/2}(\theta_1 - \theta_1^0)\big) + D_n^{(2)}(\theta_1, \theta_2) \qquad (5)$$
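The decomposition (5) is a telescoping identity. As a short check, reading the three processes as $D_n(\theta_1,\theta_2) = M_n(\theta_1,\theta_2) - M_n(\theta_1^0,\theta_2^0)$, $D_n^{(1)}(w_1) = M_n(\theta_1^0 + n^{-1/2}w_1, \theta_2^0) - M_n(\theta_1^0,\theta_2^0)$ and $D_n^{(2)}(\theta_1,\theta_2) = M_n(\theta_1,\theta_2) - M_n(\theta_1,\theta_2^0)$, and taking $w_1 = n^{1/2}(\theta_1 - \theta_1^0)$ so that $\theta_1^0 + n^{-1/2}w_1 = \theta_1$:

```latex
\begin{align*}
D_n^{(1)}\bigl(n^{1/2}(\theta_1-\theta_1^0)\bigr) + D_n^{(2)}(\theta_1,\theta_2)
 &= \bigl[M_n(\theta_1,\theta_2^0) - M_n(\theta_1^0,\theta_2^0)\bigr]
  + \bigl[M_n(\theta_1,\theta_2) - M_n(\theta_1,\theta_2^0)\bigr] \\
 &= M_n(\theta_1,\theta_2) - M_n(\theta_1^0,\theta_2^0)
  = D_n(\theta_1,\theta_2).
\end{align*}
```

The intermediate term $M_n(\theta_1, \theta_2^0)$ cancels, which is what allows the regression parameters and the change-points to be treated separately in the proofs.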



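Theorem 3.1 (i) Under assumptions (A1), (A3), (B1), (B3), (C1) and (C2) we have:

$$\hat\theta_n \xrightarrow[n \to \infty]{a.s.} \theta^0.$$

(ii) Under assumptions (A1)-(A3), (B1)-(B3), (C1)-(C5), we have

$$n\|\hat\theta_{2n} - \theta_2^0\| = O_{\mathbb{P}}(1), \qquad n^{1/2}\|\hat\theta_{1n} - \theta_1^0\| = O_{\mathbb{P}}(1) \qquad (6)$$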

Proof of Theorem 3.1 (i) To show the strong consistency of the M-estimator, we first prove that the function $e(\theta) := \mathbb{E}_{(\varepsilon,X)}\big[|\rho(Y - f_\theta(X)) - \rho(Y - f_{\theta^0}(X))|\big]$ is continuous. By the mean value theorem, we have:

$$|\rho(Y - f_\theta(X)) - \rho(Y - f_{\theta^0}(X))| \le \left|\int_0^{\delta_{(\theta,\theta^0)}(X)} |\psi(\varepsilon - v)|\,dv\right| \qquad (7)$$

Then $e(\theta) < \infty$ for all $\theta \in \bar\Omega$, whence the function $e(\theta)$ is well defined. The uniform convergence result given by Lemma 4.2 and $e(\theta^0) = 0$ imply that the function $e$ is continuous on $\bar\Omega$. By (2), we have that $e(\theta) \ne 0$ for all $\theta \ne \theta^0$. Then we can apply a method similar to that of Huber (1967) and we obtain the strong consistency of the M-estimator.
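(ii) Since $\hat\theta_n$ is strongly consistent, it suffices to suppose $\theta$ in a $\varrho$-neighborhood of $\theta^0$. For positive constants $b$ and $\varrho$, which will be determined later, let the sets of parameters be:

$$V_{1b\varrho} := \{\theta \in \Omega_\varrho(\theta^0);\ n^{1/2}\|\theta_1 - \theta_1^0\| > b\}, \qquad V_{2b\varrho} := \{\theta \in \Omega_\varrho(\theta^0);\ n\|\theta_2 - \theta_2^0\| > b\}$$

The theorem is proved if we show that: for any $\gamma > 0$, $c \in (0, \infty)$ there exist $b \in (0, \infty)$ and $n_{\gamma b} \in \mathbb{N}$ such that:

$$\mathbb{P}\Big[\inf_{\theta \in \bar V_{jb\varrho}} D_n(\theta_1, \theta_2) > c\Big] > 1 - \gamma, \qquad \forall n \ge n_{\gamma b}, \ j = 1, 2 \qquad (8)$$

where $\bar V_{jb\varrho}$ is the closure of $V_{jb\varrho}$. By relation (5), we have, for $j = 1, 2$:

$$\inf_{\theta \in \bar V_{jb\varrho}} D_n(\theta_1, \theta_2) \ge \inf_{\theta \in \bar V_{jb\varrho}} D_n^{(1)}\big(n^{1/2}(\theta_1 - \theta_1^0)\big) + \inf_{\theta \in \bar V_{jb\varrho}} D_n^{(2)}(\theta_1, \theta_2) \qquad (9)$$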

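• The study of $D_n^{(1)}$ is simpler because it involves only the regression parameters:

$$\inf_{\theta \in \bar V_{2b\varrho}} D_n^{(1)}\big(n^{1/2}(\theta_1 - \theta_1^0)\big) = \min\Big\{\inf_{\|w_1\| \le b} D_n^{(1)}(w_1), \ \inf_{\|w_1\| > b} D_n^{(1)}(w_1)\Big\} \qquad (10)$$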

For w\ G r^ +1 , since p is convexe, there are 6 > such that inf D n l \wi) is greater than 
CEti^ffsJ+n- 1 / 2 !!)!,^)^^'))' f° r ll^ill > ^- Assumption (B2) and the convexity of p imply 
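that for all $\gamma, \gamma_1 > 0$, there are $b_1 > 0$ ($b_1 > b$) and $n_\gamma \in \mathbb{N}$ such that:

$$\mathbb{P}\Big[\inf_{\|w_1\| > b_1} D_n^{(1)}(w_1) > \gamma_1\Big] > 1 - \gamma/2, \qquad \forall n \ge n_\gamma \qquad (11)$$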



Using relation (11) and the approximation of $D_n^{(1)}$ given in Lemma 4.4 we obtain that the minimum of (10) is $O_{\mathbb{P}}(1)$.

On the other hand, for $\theta \in \bar V_{1b\varrho}$ we have with an arbitrarily large probability, for $n$ large: $\inf_{\theta \in \bar V_{1b\varrho}} D_n^{(1)}\big(n^{1/2}(\theta_1 - \theta_1^0)\big) = \inf_{\|w_1\| > b_1} D_n^{(1)}(w_1)$. Relation (11) implies that $\inf_{\theta \in \bar V_{1b\varrho}} D_n^{(1)}\big(n^{1/2}(\theta_1 - \theta_1^0)\big)$ is arbitrarily large and positive with a probability close to 1.

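• We now turn to the study of $D_n^{(2)}$. For any positive numbers $b$ and $\varrho$, we prove that $\inf_{\theta \in \bar V_{1b\varrho}} D_n^{(2)}(\theta_1, \theta_2) = O_{\mathbb{P}}(1)$, using the decomposition:

$$\inf_{\theta \in \bar V_{1b\varrho}} D_n^{(2)}(\theta_1, \theta_2) = \min\Big\{\inf_{\theta \in \bar V_{1b\varrho} \cap V_{2b\varrho}} D_n^{(2)}(\theta_1, \theta_2), \ \inf_{\theta \in \bar V_{1b\varrho} \cap V^c_{2b\varrho}} D_n^{(2)}(\theta_1, \theta_2)\Big\}$$

with $V^c_{2b\varrho} = \{\theta \in \Omega_\varrho(\theta^0),\ n\|\theta_2 - \theta_2^0\| \le b\}$. Taking into account the convexity of $\rho$ and the approximation of $D_n^{(2)}$ given in Lemma 4.6 we obtain:

$$\inf_{\theta \in \bar V_{1b\varrho} \cap V^c_{2b\varrho}} D_n^{(2)}(\theta_1, \theta_2) = \inf_{\|t\| \le b,\ \|w_1\| = b} D_n^{(2)}\big(\theta_1^0 + n^{-1/2} w_1, \theta_2^0 + n^{-1} t\big) = O_{\mathbb{P}}(1)$$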

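Consider $D_n^{(2)}$ for $\theta$ in $V_{2b\varrho}$. By Lemma 4.5, for all positive numbers $\gamma$ and $c$, there exist $\gamma_2, b_2 \in (0, \infty)$, $\varrho \in (0, 1)$, and $n_2 \in \mathbb{N}$ such that $\gamma_2 b_2 \inf_k \varphi(\tau_k^0) > 2c$ and relation (22) is true. By (A1), we choose $\varrho \in (0, 1)$ sufficiently small such that $\inf_{\tau_k^0 < x < \tau_k^0 + \varrho} \varphi(x) > \varphi(\tau_k^0)/2$ for all $k = 1, \dots, K$. Then, for $n > b_2/\varrho$ we have: $\inf_{\tau_k^0 < x < \tau_k^0 + b_2 n^{-1}} \varphi(x) \ge \varphi(\tau_k^0)/2$, for all $k = 1, \dots, K$, and:

$$\mathbb{P}\Big[\inf_{\theta \in V_{2b_2\varrho}} D_n^{(2)}(\theta_1, \theta_2) > c\Big] \ge \mathbb{P}\Big[\inf_{\theta \in V_{2b_2\varrho}} \frac{D_n^{(2)}(\theta_1, \theta_2)}{nG(|\theta_2 - \theta_2^0|)} > \gamma_2\Big] \ge 1 - \frac{\gamma}{2}, \qquad \text{for } n \ge n_2 \qquad (12)$$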



Hence the second term of the right-hand side of inequality (9) is arbitrarily large with arbitrarily large probability for sufficiently large $n$.

• In conclusion, we showed that for every set $V_{1b_2\varrho}$, $V_{2b_2\varrho}$, the right-hand side of (9) is the sum of an $O_{\mathbb{P}}(1)$ term and of arbitrarily large random variables. This implies relation (8).



3.2 Asymptotic distributions 

We now give the limiting distribution of the M-estimator and an asymptotic approximation for the M-process.

Let us consider $t \in \mathbb{R}^{*K}$ and $w_1 \in \Gamma^{K+1}$. For $D_n$ defined by (4) as a process in the standardized parameters, we have the following decomposition:

$$D_n\big(\theta_1^0 + n^{-1/2} w_1, \theta_2^0 + n^{-1} t\big) = D_n^{(1)}(w_1) + D_n^{(2)}\big(\theta_1^0 + n^{-1/2} w_1, \theta_2^0 + n^{-1} t\big) \qquad (13)$$

Let us denote by $V_0 := \mathbb{E}_X\big[\dot f_{\theta^0}(X)\, \dot f^t_{\theta^0}(X)\big]$ the Fisher information matrix corresponding to the random model in $X$. We suppose that the matrix $V_0$ is invertible.

The M-process is rescaled in $D_n$ with regard to the rate of convergence. Let us consider the random vector:

$$Z_n := n^{-1/2} \sum_{i=1}^n \dot f_{\theta^0}(X_i)\, \psi(\varepsilon_i)$$

Let $D(-\infty, \infty)$ be the set of all càdlàg functions on $(-\infty, \infty)$ with the Skorokhod topology.

The next theorem gives the joint asymptotic distributions of the M-estimators. In the asymptotic behaviour of the regression parameters estimator, the independence of the error $\varepsilon$ and of the regressor $X$ intervenes in an essential way in the variance formula. Also, for $\hat\theta_{1n}$, the asymptotic approximation is similar to that of the M-estimator in a model without break. On the other hand, the asymptotic distribution of the change-point estimators depends only on the density of $X$ at the true break points and on the differences $\rho(\varepsilon \pm d_k^0) - \rho(\varepsilon)$.

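Theorem 3.2 Under assumptions (A1)-(A3), (B1)-(B3), (C1)-(C5), we have

$$n^{1/2}(\hat\theta_{1n} - \theta_1^0) = [\lambda'(0)]^{-1} V_0^{-1} Z_n + o_{\mathbb{P}}(1) \qquad (14)$$

Moreover, $\big(n^{1/2}(\hat\theta_{1n} - \theta_1^0),\ n(\hat\theta_{2n} - \theta_2^0)\big) \xrightarrow[n \to \infty]{\mathcal{L}} (Z, \Pi_-)$, with $Z \sim \mathcal{N}_{(K+1)d}\big(0,\ \mathbb{E}_\varepsilon[\psi^2(\varepsilon)]\, \lambda'(0)^{-2}\, V_0^{-1}\big)$ a Gaussian random vector independent of $\Pi_- = (\Pi_{1-}, \dots, \Pi_{K-})$, $\Pi_{k-} = \arg\min_{t_k \in \mathbb{R}} \mathcal{P}_k(t_k)$, where:

$$\mathcal{P}_k(t_k) = \mathcal{P}_{k1}(t_k)\mathbf{1}_{t_k > 0} + \mathcal{P}_{k2}(-t_k)\mathbf{1}_{t_k < 0} \qquad (15)$$

$\mathcal{P}_{k1}$ and $\mathcal{P}_{k2}$ are two independent compound Poisson processes on $[0, \infty)$ with rate $\varphi(\tau_k^0)$ and $\mathcal{P}_{k1}(0) = \mathcal{P}_{k2}(0) = 0$. The distribution of the jumps is given by $\rho(\varepsilon + d_k^0) - \rho(\varepsilon)$, respectively $\rho(\varepsilon - d_k^0) - \rho(\varepsilon)$.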
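The limit law of the change-point estimator has no closed form in general, but it can be simulated. The sketch below draws one path of a two-sided compound Poisson process of the type appearing in (15), truncated to $[-b, b]$, and returns its smallest minimizer. Everything specific here is an assumption made for illustration: Gaussian errors, $\rho(x) = x^2$, the numerical values of the rate and of the jump size, the truncation level `b`, and the function names.

```python
import random


def rho_ls(x):
    """Least squares criterion rho(x) = x^2."""
    return x * x


def smallest_minimizer(rate, d, rho, sigma=1.0, b=20.0, seed=0):
    """Simulate P(t) on [-b, b] and return (smallest minimiser, minimum value).

    P(t) = P1(t) for t > 0 and P2(-t) for t < 0, where P1, P2 are independent
    compound Poisson processes with the given rate, P(0) = 0, and jumps
    rho(eps + d) - rho(eps) and rho(eps - d) - rho(eps) respectively.
    """
    rng = random.Random(seed)

    def path(sign):
        # Jump times and running sums of one compound Poisson process on [0, b].
        t, s, out = 0.0, 0.0, []
        while True:
            t += rng.expovariate(rate)
            if t > b:
                return out
            eps = rng.gauss(0.0, sigma)
            s += rho(eps + sign * d) - rho(eps)
            out.append((t, s))

    right, left = path(+1), path(-1)
    # Each candidate is (value, left end point of the interval where it holds).
    # Around the origin the process equals 0, back to the first left jump.
    first_left = -left[0][0] if left else -b
    candidates = [(0.0, first_left)]
    candidates += [(s, t) for t, s in right]
    for j, (t, s) in enumerate(left):
        nxt = left[j + 1][0] if j + 1 < len(left) else b
        candidates.append((s, -nxt))
    value, argmin = min(candidates)   # ties resolved towards the smaller location
    return argmin, value


print(smallest_minimizer(rate=0.4, d=2.0, rho=rho_ls))
```

Repeating the call over many seeds gives an empirical approximation of the distribution of $\Pi_{k-}$, which could serve as a basis for simulated confidence intervals for a change-point once the rate and the jump law are estimated.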

Proof of Theorem 3.2 Using the approximation results obtained in Lemmas 4.4 and 4.6 and also the decomposition $M_n\big(\theta_1^0 + n^{-1/2} w_1, \theta_2^0 + n^{-1} t\big) = M_n(\theta_1^0, \theta_2^0) + D_n\big(\theta_1^0 + n^{-1/2} w_1, \theta_2^0 + n^{-1} t\big)$, we obtain an asymptotic approximation for the standardized M-process as the sum of two processes. The first is a quadratic form $Q_n(w_1)$ in the standardized regression parameters, the second is an empirical process in the standardized change-point parameters:

$$M_n\big(\theta_1^0 + n^{-1/2} w_1, \theta_2^0 + n^{-1} t\big) = Q_n(w_1) + D_n^{(2)}\big(\theta_1^0, \theta_2^0 + n^{-1} t\big) + o_{\mathbb{P}}(1) \qquad (16)$$

where:

$$Q_n(w_1) = M_n(\theta_1^0, \theta_2^0) - n^{-1/2} w_1^t \sum_{i=1}^n \dot f_{\theta^0}(X_i)\, \psi(\varepsilon_i) + \frac{\lambda'(0)}{2}\, w_1^t V_0 w_1 \qquad (17)$$

Recall that $\psi$ is the derivative of the function $\rho$.

For $t = (t_1, \dots, t_K) \in \mathbb{R}^{*K}$, $w_1 \in \Gamma^{K+1}$, by relation (16) we have that minimizing $M_n\big(\theta_1^0 + n^{-1/2} w_1, \theta_2^0 + n^{-1} t\big)$ with respect to $(w_1, t)$ is equivalent to minimizing $Q_n(w_1)$ with respect to $w_1$ and $D_n^{(2)}\big(\theta_1^0, \theta_2^0 + n^{-1} t\big)$ with respect to $t$. Then relation (14) results from (17). Relation (14) implies that the study of the limiting distribution of $n^{1/2}(\hat\theta_{1n} - \theta_1^0)$ amounts to studying the limit law of $Z_n$. Taking into account (C1), by a Central Limit Theorem, $Z_n$ converges in distribution to the Gaussian distribution $\mathcal{N}_{(K+1)d}\big(0, V_0\, \mathbb{E}_\varepsilon[\psi^2(\varepsilon)]\big)$.
In view of Theorem 3.1 (ii), for the change-point estimator we have:

$$n(\hat\theta_{2n} - \theta_2^0) = \arg\min_t D_n^{(2)}\big(\theta_1^0, \theta_2^0 + n^{-1} t\big) + o_{\mathbb{P}}(1)$$
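The step from the quadratic form $Q_n(w_1) = M_n(\theta_1^0,\theta_2^0) - w_1^t Z_n + \tfrac{\lambda'(0)}{2} w_1^t V_0 w_1$ to the representation $n^{1/2}(\hat\theta_{1n} - \theta_1^0) = [\lambda'(0)]^{-1}V_0^{-1}Z_n + o_{\mathbb{P}}(1)$ can be spelled out as follows. This is a sketch that assumes $V_0$ is symmetric positive definite and $\lambda'(0) > 0$ (which is compatible with $\lambda$ strictly increasing and $\lambda'(0) \ne 0$), and it uses the independence of $\varepsilon$ and $X$ together with $\mathbb{E}_\varepsilon[\psi(\varepsilon)] = \lambda(0) = 0$ for the variance of $Z_n$.

```latex
\begin{align*}
\nabla_{w_1} Q_n(w_1)
  &= -\,n^{-1/2}\sum_{i=1}^n \dot f_{\theta^0}(X_i)\,\psi(\varepsilon_i) + \lambda'(0)\,V_0\,w_1
   = -Z_n + \lambda'(0)\,V_0\,w_1, \\
\nabla_{w_1} Q_n(\hat w_1) = 0
  &\iff \hat w_1 = [\lambda'(0)]^{-1}\, V_0^{-1} Z_n, \\
\operatorname{Var}(Z_n)
  &= \mathbb{E}_\varepsilon[\psi^2(\varepsilon)]\; \mathbb{E}_X\bigl[\dot f_{\theta^0}(X)\,\dot f^{\,t}_{\theta^0}(X)\bigr]
   = \mathbb{E}_\varepsilon[\psi^2(\varepsilon)]\, V_0, \\
\operatorname{Var}(\hat w_1)
  &= [\lambda'(0)]^{-2}\, V_0^{-1}\,\operatorname{Var}(Z_n)\,V_0^{-1}
   = \mathbb{E}_\varepsilon[\psi^2(\varepsilon)]\,[\lambda'(0)]^{-2}\, V_0^{-1}.
\end{align*}
```

The last line matches the covariance matrix of the Gaussian limit stated in Theorem 3.2.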



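To study jointly the distribution of $D_n^{(2)}$ and of $Z_n$ we apply Theorem 4.2 of Koul et al. (2003) for $f_n(X, \varepsilon) := \dot f_{\theta^0}(X)\, \psi(\varepsilon)$ and $h_n(X, \varepsilon) := \rho\big(\varepsilon + d_{(\alpha_k^0, \alpha_{k-1}^0)}(X)\big) - \rho(\varepsilon)$. Note that: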

1 ' V. k ' k — 1 ' 

hand, for £ n (x, z) := 1E £ [exp 



mm{T°T°+t k /n}<Xi<max{T°T°+t k /n}- 



On the other 



in 



-1/2 J, 



f n (X,ej) \X = x] we have: 



|n(l -£ n (x,z))\ < 



dh a (x) 



da 



'/«*(*) <CIE £ [^ 2 (e)]|M| 2 sup 

J a 

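By assumptions (1) and (A1), we obtain that $n(1 - \xi_n(x, z))$ is uniformly integrable with respect to $dH(x)$, where $H$ is the distribution function of $X$. Thus $n(1 - \xi_n(x, z)) \to \mathbb{E}_\varepsilon[\psi^2(\varepsilon)]\, z^t \Delta(x) z$, with $\Delta(x) := \dot f_{\theta^0}(x)\, \dot f^t_{\theta^0}(x)$ and $\Delta := V_0 = \mathbb{E}_X[\Delta(X)]$. Whence:

$$\Big(Z_n,\ D_n^{(2)}\big(\theta_1^0, \theta_2^0 + n^{-1} t\big)\Big) \xrightarrow[n \to \infty]{\mathcal{L}} \Big(\mathcal{N}_{(K+1)d}\big(0, V_0\, \mathbb{E}_\varepsilon[\psi^2(\varepsilon)]\big),\ \mathcal{P}(t)\Big)$$

in $\mathbb{R}^{(K+1)d} \times D(-\infty, \infty)^K$, with $\mathcal{P}(t) := \sum_{k=1}^K \mathcal{P}_k(t_k)$. The random vector $\mathcal{N}_{(K+1)d}\big(0, V_0\, \mathbb{E}_\varepsilon[\psi^2(\varepsilon)]\big)$ is independent of $\mathcal{P}_k$, $k = 1, \dots, K$.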

We now prove that $n(\hat\theta_{2n} - \theta_2^0)$ converges weakly to the smallest minimizer $\Pi_-$ of the process $\mathcal{P}$ and then show that the components of this vector coincide with the minimizers of $\mathcal{P}_k(t_k)$, with probability 1. In view of the definition of the Skorokhod space $D(-\infty, \infty)$, we consider that the change-points vary in a compact of $\mathbb{R}^K$.

We consider the M-estimator of the change-points: 9\ n := argmin te [_ 66 jic M n (j)\ n (t),t\ and 

the minimizer of V(t): LT^ := argmin tg [_ fe mk V(t), for a fixed b > 0. By Theorem 13. 1\ there is 

a real number b < oo such that §2 n — Q\ n —> a.s. for n — > oo. More, it also exists a real b < oo 
such that Il_ = ITi with a probability arbitrarily large. 
Then, we shall first prove that for all b > 0: 



n(6 



2n 



it 



(18) 



For t G [— b,b] , b = (b,...,b) a K-vector, we consider the random process V (t) := ^{t)!.^^ 

lij^j. Let also, for v € M, 



and: Af*(i) := |M„ (flm^ + n~H),e% + n" 1 *) - M n [6 
the random process: 



'In ^2)) ^2 



H k n (v) 



[pfa + sign(v)d° k ) - p{e l )] l min ( T o iT o +n -i^ <Xi < max ( T o >T o +n _i„j 

D£°(0§ + n- 1 t)-.H n (t) 



i=l 



and theirs sum: H n (t) = Y2k=l ^n(*&)- So by (C3), ]E^ £ _ X ) 
is bounded to upper by 

A' 



su P||t||<6 



fc=1 i|x-r0|<n-l6 



^(x)iE £ 



p e 



+ d K,«t 1 )( x )J - P ( e + sign{t k )d° k ) 



dx 
K f r

n J2 <fip)E R \d, a o o ) (x)-d k \\ip(e + y x ) 

K 



(fx, with y x — > 0, for x — > 



< 



fc=1 -/|x-7£|<n-il 



s - T k \ I SU P 



dh a (x) 



da 



cp(x)JE E [ip(e + y^)] <ix 



But < C and JE £ [ip(e + y x )] < C as a continuously function on a compact. Then, by the 
Cauchy-Schwarz inequality: 



IE, 



(e,X) 



sup 

||t||<6 



D®(0° + n-H)-H n (t) 



K 

<cuy: 

k=l 



(x - T?,) 2 dx 



x-T^\<n~ 1 b 



1/2 



O(l) 



Hence: supiu||< 6 



ojp(l). Let us consider: IT* = argmin te r 



-b,by 



H n (t). By 



M b n (t)-H n (t) 
Lemmas 4.3 and 4.4 of Koul et al. (2003) we obtain: 

n 02n ~ e l) - n n 0, II* U b _. Then relation (TJHD follows. Because for two dif- 

n.^oo n— >oo 

ferent change-points we have to make of two independent sets of random variables we have 
that: argmin tg [_ fe;b ]A- H n (t)) = Ylk=i ar § m ^ n t k e[-b,b] H*(tk))- The last relation, with (fl~8l) and 



ip 



IP1, imply that the asymptotic distribution of n( 



1 2n 



is n_ 







Remarks. 1. For $K = 1$, we recover the result that the empirical process $D_n^{(2)}\big(\theta_1^0, \theta_2^0 + n^{-1} t\big)$ converges to a compound Poisson process. We also recover the asymptotic distributions for the particular estimators ML, LS, LAD. In particular, the asymptotic variance of the ML estimator of the regression parameters is $\big(\mathbb{E}_\varepsilon[(\varphi_\varepsilon'(\varepsilon)/\varphi_\varepsilon(\varepsilon))^2]\, V_0\big)^{-1}$, with $\varphi_\varepsilon$ the density of $\varepsilon$.
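As a worked illustration of how the general covariance $\mathbb{E}_\varepsilon[\psi^2(\varepsilon)]\,\lambda'(0)^{-2}\,V_0^{-1}$ of Theorem 3.2 specializes, the LS and LAD cases can be computed directly. The notation $\sigma^2 = \mathbb{E}(\varepsilon^2)$ and $F_\varepsilon$ (distribution function of $\varepsilon$) is introduced here for the example; for LAD the condition $\lambda(0) = 0$ means that $\varepsilon$ has median zero.

```latex
\begin{align*}
\text{LS:}\quad & \rho(x) = x^2,\quad \psi(x) = 2x,\quad
  \lambda(y) = \mathbb{E}_\varepsilon[2(\varepsilon + y)] = 2y,\quad \lambda'(0) = 2,\quad
  \mathbb{E}_\varepsilon[\psi^2(\varepsilon)] = 4\sigma^2 \\
 & \Longrightarrow\quad \mathbb{E}_\varepsilon[\psi^2(\varepsilon)]\,\lambda'(0)^{-2}\,V_0^{-1} = \sigma^2\, V_0^{-1}, \\[4pt]
\text{LAD:}\quad & \rho(x) = |x|,\quad \psi(x) = \operatorname{sign}(x),\quad
  \lambda(y) = 1 - 2F_\varepsilon(-y),\quad \lambda'(0) = 2\varphi_\varepsilon(0),\quad
  \mathbb{E}_\varepsilon[\psi^2(\varepsilon)] = 1 \\
 & \Longrightarrow\quad \mathbb{E}_\varepsilon[\psi^2(\varepsilon)]\,\lambda'(0)^{-2}\,V_0^{-1}
   = \frac{1}{4\,\varphi_\varepsilon(0)^2}\, V_0^{-1}.
\end{align*}
```

These are the familiar variance factors of the mean-type and median-type estimators in a model without break, in line with the remark that the regression part behaves as if the change-points were known.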

2. As a consequence of Theorem 3.2, we can construct confidence intervals or perform hypothesis tests for the parameter $\theta$.

3. The discontinuity of the regression function at the change-points influences the rate of convergence of the change-point estimator. The results proved here differ from those for non-random designs, whether the model is continuous or discontinuous at the change-points. For example, Van der Geer (1988) proves that, for a discontinuous two-phase model with uniform non-random design, the limiting distribution of the change-point estimator is determined by a Brownian motion with a linear drift. Rukhin and Vajda (1997) prove, for a continuous model, that the M-estimator of the change-point is asymptotically normal.



4 Appendix: Lemmas 

To begin, we state an elementary lemma.
Lemma 4.1 For any $k$ random variables $Z_1, \dots, Z_k$ the following inequalities are valid:



1)< P 



1=1 



J> <0 



vi=l 



<^iP[Z. 4 <0] 



1=1 



The following lemma of uniform convergence will be useful in the proofs of the main theorems. 



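Lemma 4.2 Under assumptions (A1), (B1), (B3) and (C2), we have

$$\lim_{\eta \searrow 0} \mathbb{E}_{(\varepsilon,X)}\Big[\sup_{\theta^* \in \Omega_\eta(\theta)} |\rho(Y - f_\theta(X)) - \rho(Y - f_{\theta^*}(X))|\Big] = 0$$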



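Proof of Lemma 4.2 We apply a version of the mean value theorem:

$$\rho(Y - f_\theta(X)) - \rho(Y - f_{\theta^*}(X)) = \delta_{(\theta^*,\theta)}(X) \int_0^1 \psi\big(Y - f_\theta(X) + v\,\delta_{(\theta,\theta^*)}(X)\big)\,dv \qquad (19)$$

We begin by showing that:

$$\lim_{\eta \searrow 0} \mathbb{E}_X\Big[\sup_{\theta^* \in \Omega_\eta(\theta)} \delta^2_{(\theta,\theta^*)}(X)\Big] = 0 \qquad (20)$$

Regarding the change-points, there are two possible cases.
Case 1. $\tau_k \in \mathbb{R}$, $\forall k = 1, \dots, K$. We have:

$$\sup_{\theta^* \in \Omega_\eta(\theta)} |\delta_{(\theta,\theta^*)}(X)| \le C\Big[\eta \sup_\alpha \Big\|\frac{\partial h_\alpha(X)}{\partial\alpha}\Big\| + 2 \sup_\alpha |h_\alpha(X)| \sum_{k=1}^K \mathbf{1}_{|X - \tau_k| < \eta}\Big]$$

Furthermore $\mathbb{P}[|X - \tau_k| < \eta] \to 0$ for $\eta \to 0$. Then, with condition (2) we obtain (20).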

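Case 2. $\tau_1 = -\infty$ or $\tau_K = \infty$. Without loss of generality, we consider $\tau_1 = -\infty$. Obviously $\tau_1^* \to \tau_1$. We have $|\tau_k^* - \tau_k| < \eta$. Then:

$$\sup_{\theta^* \in \Omega_\eta(\theta)} |\delta_{(\theta,\theta^*)}(X)| \le C\Big[\eta \sup_\alpha \Big\|\frac{\partial h_\alpha(X)}{\partial\alpha}\Big\| + 2 \sup_\alpha |h_\alpha(X)| \Big(\mathbf{1}_{X < \tau_1^*} + \sum_{k=2}^K \mathbf{1}_{|X - \tau_k| < \eta}\Big)\Big]$$

But $\mathbb{P}[X < \tau_1^*] \to 0$ for $\eta \to 0$. Using assumption (B3) and the Cauchy-Schwarz inequality, we obtain relation (20).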

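On the other hand, using the inequality: $\forall x \in \mathbb{R}$, $|\psi(x + \varepsilon)| \le |\psi(\varepsilon + |x|)| + |\psi(\varepsilon - |x|)|$, we have, by the Cauchy-Schwarz inequality:

$$\mathbb{E}_{(\varepsilon,X)}\Big[\sup_{\theta^* \in \Omega_\eta(\theta)} |\delta_{(\theta,\theta^*)}(X)| \int_0^1 \big|\psi\big(Y - f_\theta(X) + v\,\delta_{(\theta,\theta^*)}(X)\big)\big|\,dv\Big]$$
$$\le C\, \mathbb{E}_X^{1/2}\Big[\sup_{\theta^* \in \Omega_\eta(\theta)} \delta^2_{(\theta,\theta^*)}(X)\Big]\, \mathbb{E}_{(\varepsilon,X)}^{1/2}\Big[\sup_{\theta^* \in \Omega_\eta(\theta)} \psi^2\big(\varepsilon + |\delta_{(\theta,\theta^*)}(X)| + |\delta_{(\theta,\theta^0)}(X)|\big)\Big]$$
$$+\ C\, \mathbb{E}_X^{1/2}\Big[\sup_{\theta^* \in \Omega_\eta(\theta)} \delta^2_{(\theta,\theta^*)}(X)\Big]\, \mathbb{E}_{(\varepsilon,X)}^{1/2}\Big[\sup_{\theta^* \in \Omega_\eta(\theta)} \psi^2\big(\varepsilon - |\delta_{(\theta,\theta^*)}(X)| - |\delta_{(\theta,\theta^0)}(X)|\big)\Big]$$

The conclusion results from relations (19), (20) and from assumption (C2).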

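For $x, z \in \mathbb{R}$, $t \in \mathbb{R}^K$, for each $k = 1, \dots, K$, let us define the function $u_k(x, z) := \rho\big(z + \mathrm{sgn}(t_k - \tau_k^0)\, d_{(\alpha_k^0, \alpha_{k-1}^0)}(x)\big) - \rho(z)$ and the function $\rho_k(x) := \mathbb{E}_\varepsilon[u_k(x, \varepsilon)]$. For each break point $\tau_k^0$ we count the number of $X_i$ which fall into the interval $(\tau_k^0, \tau_k^0 + |u_k|)$, with $u = (u_1, \dots, u_K) \in \mathbb{R}^K$. Let us consider the functions $G_k, G_{k,n} : \mathbb{R}^* \to (0, 1]$, where $\mathbb{R}^* = \mathbb{R} \setminus \{0\}$:

$$G_{k,n}(u_k) := n^{-1} \sum_{i=1}^n \mathbf{1}_{\min(\tau_k^0, \tau_k^0 + u_k) < X_i \le \max(\tau_k^0, \tau_k^0 + u_k)}$$

and its expectation: $G_k(u_k) := \mathbb{E}_X\big[\mathbf{1}_{\min(\tau_k^0, \tau_k^0 + u_k) < X \le \max(\tau_k^0, \tau_k^0 + u_k)}\big]$. For all $K$ change-points we define the functions $G, G_n : \mathbb{R}^{*K} \to \mathbb{R}_+$,

$$G(u) := \sum_{k=1}^K G_k(u_k), \qquad G_n(u) := \sum_{k=1}^K G_{k,n}(u_k)$$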

We present a lemma that states an important property of these functions.

Lemma 4.3 Under (A1), for each $\gamma > 0$, $\eta > 0$, there exists a constant $0 < B < \infty$, such that for all $b \in (0, 1)$, and $n \ge [B/b] + 1$,

$$\mathbb{P}\Big[\sup_{B/n < \|u\| \le b} \Big|\frac{G_n(u)}{G(u)} - 1\Big| < \eta\Big] > 1 - \gamma$$

$$\mathbb{P}\Big[\sup_{B/n < \|u\| \le b} \Big|\frac{Z^{(1)}_{k,n}(u_k)}{G(u)}\Big| < \eta\Big] > 1 - \gamma$$

where $Z^{(1)}_{k,n} : \mathbb{R}^* \to \mathbb{R}$, $k = 1, \dots, K$, is defined by:

$$Z^{(1)}_{k,n}(u_k) := n^{-1} \sum_{i=1}^n \big[\rho_k(X_i) - u_k(X_i, \varepsilon_i)\big]\, \mathbf{1}_{\min(\tau_k^0, \tau_k^0 + u_k) < X_i \le \max(\tau_k^0, \tau_k^0 + u_k)}$$

Proof of Lemma 4.3 The proof is similar to that of Lemma 5.1 of Ciuperca and Dapzol (2008), using the results for a single change-point (see Lemma 3.2 of Koul and Qian (2002)).

Let us now give an approximation of the M-process at $\theta^0$ in the direction of the regression parameters.

Lemma 4.4 Under assumptions (A1)-(A3), (B1), (B3), (C1), (C4) and (C5), for each $b \in (0, \infty)$, we have:

$$\sup_{\|w_1\| \le b} \Big|D_n^{(1)}(w_1) + n^{-1/2} w_1^t \sum_{i=1}^n \dot f_{\theta^0}(X_i)\, \psi(\varepsilon_i) - \frac{\lambda'(0)}{2}\, w_1^t V_0 w_1\Big| = o_{\mathbb{P}}(1) \qquad (21)$$



Proof of Lemma 14.41 Using (CI), (C4) and (C5), we have that: Dn\w\) is equal to: 

n 
1=1 

1 " r 12 

i=i 

Since A(0) = and by the assumption (pQ) we obtain: 



l * 

n-VV^/eo^V^) - -n-^i^/go^)/^^)^^^) } (1 + <*,(!)) 



i=l 



i=l 



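with $o_{\mathbb{P}}(1)$ uniformly in $w_1$ and $n$. Thus, using (A1)-(A3) and (C4), by the strong law of large numbers for $n^{-1} \sum_{i=1}^n \dot f_{\theta^0}(X_i)\, \dot f^t_{\theta^0}(X_i)\, \psi'(\varepsilon_i)$ and by assumption (1) for $h_\alpha$, we get:

$$D_n^{(1)}(w_1) = \Big\{-n^{-1/2} w_1^t \sum_{i=1}^n \dot f_{\theta^0}(X_i)\, \psi(\varepsilon_i) + \frac{\lambda'(0)}{2}\, w_1^t V_0 w_1\Big\}(1 + o_{\mathbb{P}}(1)).$$

Thus the proof is complete.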

In the following lemma, the set $V_{2b_2\varrho}$ is defined in the proof of Theorem 3.1. The proof of the similar result for the two-phase linear model of Koul et al. (2003) is facilitated by the existence of a single change-point and especially by the linearity in $x$ of $h_\alpha(x)$.

Lemma 4.5 Under assumptions (2), (A1)-(A3), (B1), (B3), (C1)-(C5), for all positive numbers $\gamma$ and $c$, there exist $\gamma_2, b_2 \in (0, \infty)$, $\varrho \in (0, 1)$, and $n_2 \in \mathbb{N}$ such that: $\gamma_2 b_2 \inf_k \varphi(\tau_k^0) > 2c$ and that:

$$\mathbb{P}\Big[\inf_{\theta \in V_{2b_2\varrho}} \frac{D_n^{(2)}(\theta_1, \theta_2)}{nG(|\theta_2 - \theta_2^0|)} > \gamma_2\Big] > 1 - \gamma/2, \qquad \forall n \ge n_2 \qquad (22)$$



Proof of Lemma 14.51 Let us introduce some notations for ease of exposition. For each change- 
r° 



point tP, consider the processes: 



S kl( 9 ^ U k) : = n 1 Ei=l [pfa + d K,a fc _!)( X i)) - Pfe + ^aO.ag.jTO) 



1 



min(T^,r^+« fe )<Xi<max(r^,T^+« fc ) 



S kl( d ^ U k) : = n 1 E"=l[p( £ i) - P( £ i + d ( a o, afc )(Xi))]IL min(T )T o +Mfe)<Xi < max(r O iT o +Ufc) 
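and the functions $Z^{(2)}_{k,n} : \mathbb{R}^* \to \mathbb{R}$, $k = 1, \dots, K$:

$$Z^{(2)}_{k,n}(u_k) := n^{-1} \sum_{i=1}^n \big[\rho_k(X_i) - \rho_k(\tau_k^0)\big]\, \mathbf{1}_{\min(\tau_k^0, \tau_k^0 + u_k) < X_i \le \max(\tau_k^0, \tau_k^0 + u_k)}$$

Let us consider $\theta_2 = \theta_2^0 + u$ with $u = (u_1, \dots, u_K)$. Given these notations, we see that $n^{-1} D_n^{(2)}(\theta_1, \theta_2)$ can be written as:

$$n^{-1} D_n^{(2)}(\theta_1, \theta_2) = \sum_{k=1}^K \rho_k(\tau_k^0)\, G_k(u_k) + \sum_{k=1}^K \rho_k(\tau_k^0)\, \big[G_{k,n}(u_k) - G_k(u_k)\big] + \sum_{k=1}^K \big[Z^{(1)}_{k,n}(u_k) + Z^{(2)}_{k,n}(u_k) + S^{(1)}_{k,n}(\theta_1, u_k) + S^{(2)}_{k,n}(\theta_1, u_k)\big] \qquad (23)$$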

We shall prove that the supremum on the set $V_{2b_2\varrho}$ of all terms on the right-hand side of (23), except the first, divided by $G(|\theta_2 - \theta_2^0|)$, is $o_{\mathbb{P}}(1)$. On the other hand, we prove that the first term is strictly positive with probability 1. Recall that $\lambda(y) = \mathbb{E}_\varepsilon[\rho'(\varepsilon + y)]$. By Fubini's lemma, by (A2), (C1) and (2), we obtain that:

$$\rho_k(\tau_k^0) = \mathbb{E}_\varepsilon\big[\rho(\varepsilon + d_k^0) - \rho(\varepsilon)\big] = \int_{\min(0, d_k^0)}^{\max(0, d_k^0)} |\lambda(z)|\,dz$$

Since the function $\lambda$ is strictly increasing and $\lambda(0) = 0$, we obtain: $\rho_k(\tau_k^0) > 0$, for each $k = 1, \dots, K$. For all $u_k \le \varrho$, $k = 1, \dots, K$, we have: $|Z^{(2)}_{k,n}(u_k)| \le \sup_{0 < v \le \varrho} |\rho_k(\tau_k^0 + v) - \rho_k(\tau_k^0)|\, G_{k,n}(u_k)$. By assumption (C1), for $n \to \infty$ and $\varrho \to 0$, we have: $\sup_{B/n < v \le \varrho} |\rho_k(\tau_k^0 + v) - \rho_k(\tau_k^0)| = o_{\mathbb{P}}(1)$, and with Lemma 4.3, for all $\varrho > 0$, there is a $B_1 > 0$ such that for $u_k \in (B/n, \varrho)$, $k = 1, \dots, K$, and for $n > B_1/\varrho$, we have: $0 < G_{k,n}(u_k)/G(u) \le G_n(u)/G(u) = 1 + o_{\mathbb{P}}(1)$. Hence:

$$\sup_{B_1/n < \|u\| \le \varrho} \frac{|Z^{(2)}_{k,n}(u_k)|}{G(u)} = o_{\mathbb{P}}(1), \qquad \text{for } n \to \infty, \ \varrho \searrow 0 \qquad (24)$$

We have a similar relation for $Z^{(1)}_{k,n}$, for a $B_2 > 0$ and $n > B_2/\varrho$.

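For $S_{k,n}^{(1)}$, by Theorem 3.1(i), for all $x \in \mathbb{R}$ and for all $\theta \in V_{2 b \varrho}$ we have $|d_{(a_{k-1}, a_{k-1}^0)}(x)| \le C\varrho$, whence:

\[ |S_{k,n}^{(1)}(\theta_1, u_k)| \le n^{-1} \sum_{i=1}^n \int_{-C\varrho}^{C\varrho} \big| \psi(\varepsilon_i + d_{(a_k^0, a_{k-1}^0)}(X_i) + v) \big|\, dv\; \mathbb{1}_{\min(\tau_k^0, \tau_k^0 + u_k) < X_i < \max(\tau_k^0, \tau_k^0 + u_k)} \]

Applying Lemma 3.2 of Koul et al. (2003) to $J(x, z) = \int_{-C\varrho}^{C\varrho} |\psi(z + d_{(a_k^0, a_{k-1}^0)}(x) + v)|\, dv$, we obtain that there exists a $B_3 \in (0, \infty)$ such that for $n \to \infty$, $\varrho \searrow 0$: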
Then, for all $\gamma > 0$, $\eta > 0$:

K (1) 

JP[ sup -— < i] }> P[ sup > — < 77 > 1 - 7 

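A similar relation holds for $\sum_{k=1}^K S_{k,n}^{(2)}(\theta_1, u_k)$, for a $B_4 > 0$. Then, for $l = 1, 2$:

\[ \sup_{\theta \in V_{2 B_{l+2} \varrho}} \frac{\big| \sum_{k=1}^K S_{k,n}^{(l)}(\theta_1, u_k) \big|}{G(u)} = o_{\mathbb{P}}(1), \quad \text{for } n \to \infty,\ \varrho \searrow 0. \tag{25} \]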

On the other hand, for each $\eta > 0$:
if 

/pr_^i / nl > P rV |Gt -" W " Gt(ut)l <- ^ 1 

[ G{u) <m ~ ^ G k (u k ) max fcft (r°) J 

>j 2p[ \GU^-G k (u k )\ < V K _ 

the last inequality is obtained by Lemma 4.1. By Lemma 4.3, for each $\eta > 0$ and $\gamma > 0$ there exists a $B_5 > 0$ such that the probability appearing in the last inequality is greater than $1 - \gamma$. Apply this with $\gamma/(4K)$ in place of $\gamma$ and with $\eta < \max_k p_k(\tau_k^0)\, [4 + 1/(K \max_k p_k(\tau_k^0))]^{-1}$. We obtain inequality (22) for $\gamma_2 = [\max_k p_k(\tau_k^0) - \eta(4 + 1/(K \max_k p_k(\tau_k^0)))]/2$, $b_2 = \max\{B_1, \dots, B_5\}$ and $n_2 = \ldots + 1$.
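As a numerical illustration of the positivity of the leading term used in this proof, the following sketch evaluates $\mathbb{E}_\varepsilon[\rho(\varepsilon + d) - \rho(\varepsilon)]$ and the integral of $|\lambda|$ between $\min(0,d)$ and $\max(0,d)$ for a few jump sizes $d$. The choices made here are assumptions for illustration only and are not taken from the paper: the Huber loss with tuning constant 1.345, standard normal errors, and plain trapezoid quadrature.

```python
import math

def huber(x, c=1.345):
    """Huber loss rho(x) (hypothetical choice of rho for this illustration)."""
    ax = abs(x)
    return 0.5 * x * x if ax <= c else c * (ax - 0.5 * c)

def huber_psi(x, c=1.345):
    """Derivative psi = rho' of the Huber loss."""
    return max(-c, min(c, x))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def expect(f, lo=-10.0, hi=10.0, m=2000):
    """E[f(eps)] for eps ~ N(0,1), by the composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / m
    total = 0.5 * (f(lo) * normal_pdf(lo) + f(hi) * normal_pdf(hi))
    for j in range(1, m):
        x = lo + j * h
        total += f(x) * normal_pdf(x)
    return total * h

def lam(y):
    """lambda(y) = E[psi(eps + y)]."""
    return expect(lambda e: huber_psi(e + y))

def jump_cost(d):
    """E[rho(eps + d) - rho(eps)] for a jump of size d."""
    return expect(lambda e: huber(e + d) - huber(e))

def integral_abs_lambda(d, m=100):
    """Trapezoid approximation of the integral of |lambda| over [min(0,d), max(0,d)]."""
    a, b = min(0.0, d), max(0.0, d)
    h = (b - a) / m
    total = 0.5 * (abs(lam(a)) + abs(lam(b)))
    for j in range(1, m):
        total += abs(lam(a + j * h))
    return total * h

# Compare both sides of the identity for negative and positive jumps.
results = {d: (jump_cost(d), integral_abs_lambda(d)) for d in (-1.5, 0.7, 2.0)}
for d, (lhs, rhs) in results.items():
    print(f"d={d:+.1f}  E[rho(eps+d)-rho(eps)]={lhs:.6f}  int|lambda|={rhs:.6f}")
```

Both columns should agree up to quadrature error and be strictly positive, for negative as well as positive jumps, which is the property the proof relies on.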

The following result gives the behaviour of $D_n^{(2)}$ in an $n^{-1/2}$-neighborhood of $\theta_1^0$.

Lemma 4.6 Under assumptions (A1)-(A3), (B1), (B3), (C2), if we define

$A_n(w_1, t) := D_n^{(2)}(\theta_1^0 + n^{-1/2} w_1, \theta_2^0 + n^{-1} t) - D_n^{(2)}(\theta_1^0, \theta_2^0 + n^{-1} t)$, we have, for every $b \in (0, \infty)$:

\[ \sup_{\|(w_1, t)\| \le b} |A_n(w_1, t)| = o_{\mathbb{P}}(1) \tag{26} \]

Proof of Lemma 4.6. Without loss of generality, we consider the vector $t = (t_1, \dots, t_K) \in \mathbb{R}_+^K$. The general case is obtained by very similar arguments. Let us denote $w_1 = (w_{1,0}, w_{1,1}, \dots, w_{1,K}) \in \mathbb{R}^{K+1}$. First, observe that we can write

« K+ 1 _ 1/2 (Xi) 

A n ( Wl ,t)=J2J2 / fe_1 " ^i + Kl{Xi) + v)dv\ TOk<Xi < Tl+n - Hk 

i=i fc=i ^° 
n K+l r d , (Xi)

"LL ^(^+^)^ 1 rO<X i <rO+n-i tfe 
i=l fe=l J0 

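Let us denote:

\[ B(w_1, t) := A_n(w_1, t) + B_1(w_1, t) - B_2(w_1, t) \tag{27} \]

with

\[ B_1(w_1, t) := \sum_{i=1}^n \sum_{k=1}^{K} d_{(a_{k-1}^0 + n^{-1/2} w_{1,k-1},\, a_{k-1}^0)}(X_i)\, \psi(\varepsilon_i)\, \mathbb{1}_{\tau_k^0 < X_i < \tau_k^0 + n^{-1} t_k} \]

and

\[ B_2(w_1, t) := \sum_{i=1}^n \sum_{k=1}^{K+1} d_{(a_k^0 + n^{-1/2} w_{1,k},\, a_k^0)}(X_i)\, \psi(\varepsilon_i)\, \mathbb{1}_{\tau_k^0 < X_i < \tau_k^0 + n^{-1} t_k} \]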

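By assumption Jl]) for $\partial h_a / \partial a$ we have that $|d_{(a_{k-1}^0 + n^{-1/2} w_{1,k-1},\, a_{k-1}^0)}(X_i)| \le n^{-1/2} U_i$, with $U_i$ a random variable such that $\mathbb{E}_X(U_i) < \infty$ and such that, for all $\gamma > 0$, there exists a real $c_1 > 0$ (depending on $\gamma$) with $\mathbb{P}[U_i < c_1] > 1 - \gamma$.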

Since $|h_a(X_i)|\, \mathbb{1}_{\tau_k^0 < X_i < \tau_k^0 + n^{-1} t_k} \le C$ with probability 1, we have that $\mathbb{E}_{(\varepsilon, X)}[|B_1(w_1, t)|]$ is bounded above by:

II 

A- - I 




^T°<Xi<r°+n-H k 



,\ ' {M £ [\i>(e + C + v)-i>(e)\}+M £ [\i>(e-C + v)-ip(e)\}}dvY / lEx 

JO ,. : 

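Since $\varepsilon$ and $X$ are independent and by the relation @:
$\mathbb{E}_{(\varepsilon, X)}[|B_1(w_1, t)|] \le o(1)\, n \sum_{k=1}^{K} G_k(n^{-1} t_k) = o(1)$. Likewise:

\[ \mathbb{E}_{(\varepsilon, X)}[|B_2(w_1, t)|] \le n \int \mathbb{E}_\varepsilon\big[ |\psi(\varepsilon + v) - \psi(\varepsilon)| \big]\, dv \sum_{k=1}^{K+1} G_k(n^{-1} t_k) = o(1) \]

Therefore, we have

\[ \mathbb{E}_{(\varepsilon, X)}[B_1(w_1, t)] = o(1), \qquad \mathbb{E}_{(\varepsilon, X)}[B_2(w_1, t)] = o(1) \tag{28} \]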

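Let us denote the random process: $D_1(w_1, t) := B(w_1, t) - A_n(w_1, t) = B_1(w_1, t) - B_2(w_1, t)$. Then, we can write:

\[ D_1(w_1, t) = \sum_{i=1}^n \sum_{k=1}^{K+1} \big[ d_{(a_{k-1}^0 + n^{-1/2} w_{1,k-1},\, a_{k-1}^0)}(X_i) - d_{(a_k^0 + n^{-1/2} w_{1,k},\, a_k^0)}(X_i) \big]\, \psi(\varepsilon_i)\, \mathbb{1}_{\tau_k^0 < X_i < \tau_k^0 + n^{-1} t_k} \]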
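Using assumption ([T]) for $\partial^2 h_a / \partial a^2$ we obtain:

\[ D_1(w_1, t) = n^{-1/2} \sum_{i=1}^n \sum_{k=1}^{K+1} \Big[ w_{1,k-1} \frac{\partial h_{a_{k-1}^0}(X_i)}{\partial a} - w_{1,k} \frac{\partial h_{a_k^0}(X_i)}{\partial a} \Big]\, \psi(\varepsilon_i)\, \mathbb{1}_{\tau_k^0 < X_i < \tau_k^0 + n^{-1} t_k} + o_{\mathbb{P}}(1) \tag{29} \]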
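The step from the differences $d$ to the first-order derivatives in (29) can be sketched as a second-order Taylor expansion. The block below is a worked illustration under assumptions that are not stated in this section: $d_{(a,b)}(x) = h_a(x) - h_b(x)$, the parameter $a$ is scalar, and $h_a(x)$ is twice differentiable in $a$; the point $\tilde a$ is introduced here only for the remainder term.

```latex
% Assumption: d_{(a,b)}(x) = h_a(x) - h_b(x), a scalar, h_a(x) twice differentiable in a.
\begin{align*}
d_{(a^0 + n^{-1/2} w,\, a^0)}(x)
  &= h_{a^0 + n^{-1/2} w}(x) - h_{a^0}(x) \\
  &= n^{-1/2}\, w\, \frac{\partial h_{a^0}(x)}{\partial a}
     + \frac{1}{2n}\, w^2\, \frac{\partial^2 h_{\tilde a}(x)}{\partial a^2},
\end{align*}
% for some \tilde a between a^0 and a^0 + n^{-1/2} w.
```

Each remainder is of order $n^{-1}$, and the expected number of observations in the window $(\tau_k^0, \tau_k^0 + n^{-1} t_k)$ is $n\, G_k(n^{-1} t_k) = O(1)$, so the summed remainders are negligible provided the second derivative admits a suitably integrable bound.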
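Finally, by assumption (C2) for $c = 0$ and by Q, we obtain that $\forall b \in (0, \infty)$, $\forall w_1 \in \mathbb{R}^{K+1}$:

\[ \sup_{0 < \|t\| \le b} \Big\| \sum_{i=1}^n \sum_{k=1}^{K+1} \Big[ w_{1,k-1} \frac{\partial h_{a_{k-1}^0}(X_i)}{\partial a} - w_{1,k} \frac{\partial h_{a_k^0}(X_i)}{\partial a} \Big]\, \psi(\varepsilon_i)\, \mathbb{1}_{\tau_k^0 < X_i < \tau_k^0 + n^{-1} t_k} \Big\| = O_{\mathbb{P}}(1) \tag{30} \]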

The conclusion follows from the relations (27), (28), (29) and (30).
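The argument above repeatedly uses that only a bounded number of design points fall in a window of width $n^{-1} t_k$ to the right of a change-point, so that sums over such windows, once multiplied by $n^{-1/2}$, vanish. A minimal numerical sketch of this scaling follows. It assumes that $G_k(u)$ denotes the probability that $X$ lies between $\tau_k^0$ and $\tau_k^0 + u$ (its definition is not restated in this section) and uses a standard normal design with hypothetical values of $\tau$ and $t$.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def window_mass(tau, u):
    """G(u) = P(tau < X <= tau + u) for X ~ N(0,1), u > 0 (assumed form of G_k)."""
    return normal_cdf(tau + u) - normal_cdf(tau)

tau, t = 0.3, 2.0  # hypothetical change-point location and local parameter

expected_counts = {}
for n in (10**2, 10**4, 10**6):
    # n * G(t/n): expected number of design points in the window (tau, tau + t/n]
    expected_counts[n] = n * window_mass(tau, t / n)
    # The second printed value, n^{-1/2} * n * G(t/n), shrinks as n grows.
    print(n, expected_counts[n], expected_counts[n] / math.sqrt(n))

# First-order limit t * g(tau), with g the design density.
limit = t * normal_pdf(tau)
print("limit:", limit)
```

The expected count stabilises near $t\, g(\tau)$, with $g$ the design density, while the count rescaled by $n^{-1/2}$ tends to zero.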